@KirCute KirCute commented Jan 15, 2026

Description

Crawls autoindex pages using antchfx/htmlquery. Because every web server renders its autoindex page differently, users have to write the XPath expressions themselves.

Motivation and Context

https://github.com/orgs/OpenListTeam/discussions/170#discussioncomment-14943992

How Has This Been Tested?

Test environment 1 is Apache/2.4.18, with the following parameters:

  • Entries: //table/tbody/tr[position() > 2]
  • File name: ./td[2]/a
  • Modified time: ./td[3]
  • File size: ./td[4]
  • Modified time format: 2006-01-02 15:04

Test environment 2 is nginx/1.29.0, with the following parameters:

  • Entries: //pre/a
  • File name: .
  • Modified time: substring(normalize-space(./following-sibling::text()[1]),1,17)
  • File size: substring(normalize-space(./following-sibling::text()[1]),19)
  • Modified time format: 02-Jan-2006 15:04
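The nginx expressions above rely on two XPath string functions: normalize-space collapses the padding that nginx puts between columns, and substring then cuts the result at fixed character positions. The sketch below reproduces that logic in plain Go to show why the offsets are 17 and 19. The helper names and the sample text node are hypothetical, and the helpers only approximate the XPath functions for positive integer arguments; this is not the driver's code.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeSpace approximates XPath normalize-space(): it trims both ends
// and collapses every run of whitespace into a single space.
func normalizeSpace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// substring approximates XPath substring(s, start, length). Positions are
// 1-based and counted in characters. A negative length means "up to the end
// of the string". Only start >= 1 is handled faithfully; smaller values are
// clamped to 1, which differs from the XPath rounding rules.
func substring(s string, start, length int) string {
	r := []rune(s)
	if start < 1 {
		start = 1
	}
	if start > len(r) {
		return ""
	}
	end := len(r)
	if length >= 0 && start-1+length < end {
		end = start - 1 + length
	}
	return string(r[start-1 : end])
}

func main() {
	// Hypothetical text node that follows an <a> element in an nginx listing.
	tail := "                    15-Jan-2026 10:30                1234\r\n"
	s := normalizeSpace(tail) // "15-Jan-2026 10:30 1234"

	fmt.Println(substring(s, 1, 17))  // modified time: 15-Jan-2026 10:30
	fmt.Println(substring(s, 19, -1)) // file size: 1234
}
```

The timestamp "15-Jan-2026 10:30" is 17 characters long, position 18 is the single space left by normalize-space, and the size starts at position 19.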

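Test environment 3 (local) is Caddy/v2.10.2, with the following parameters:

  • Entries: //table/tbody/tr
  • File name: ./td[2]/a/span
  • Modified time: ./td[4]/time
  • File size: ./td[3]/div/div[2]
  • Ignored file names: Up
  • Modified time format: 01/02/2006 03:04:05 PM -07:00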

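Test environment 4 (local) is SimpleHTTP/0.6 Python/3.11.5, with the following parameters:

  • Entries: //ul/li
  • File name: ./a
  • Modified time: not available in the listing, so left empty
  • File size: not available in the listing, so left empty
  • Modified time format: irrelevant, since there is no modified time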

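Test environment 5 reports only "Apache" in its response header, without a version. It uses the following parameters:

  • Entries: //pre/pre/a[position() > 4]
  • File name: .
  • Modified time: substring(normalize-space(./following-sibling::text()[1]),1,16)
  • File size: substring(normalize-space(./following-sibling::text()[1]),18)
  • Modified time format: 2006-01-02 15:04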

Checklist

  • I have read the CONTRIBUTING document.
  • I have formatted my code with go fmt or prettier.
  • I have added appropriate labels to this PR (or mentioned needed labels in the description if lacking permissions).
  • I have requested review from relevant code authors using the "Request review" feature when applicable.
  • I have updated the repository accordingly (if needed).

Copilot AI left a comment

Pull request overview

This PR adds a new "Autoindex" driver that can crawl and parse autoindex pages (directory listings) from web servers like Apache using XPath expressions. The driver uses the antchfx/htmlquery library to parse HTML content and extract file information.

Changes:

  • Added autoindex driver with configurable XPath expressions for parsing different autoindex formats
  • Implemented List and Link operations for browsing and accessing files from autoindex pages
  • Added size parsing utility that handles various unit formats (K, M, G, etc.)
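As an illustration of the size-parsing item above, here is a minimal sketch of turning a human-readable size from a listing into a byte count. It is not the code in drivers/autoindex/util.go: the parseSize name, the accepted suffixes, and the choice of binary multiples (1K = 1024 bytes) are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts strings such as "1234", "1.5K", "20 MB" or "3GiB" into
// a byte count. Assumption: units are binary multiples (1K = 1024 bytes).
// Values that are not sizes, such as the "-" shown for directories, yield an error.
func parseSize(s string) (int64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	s = strings.TrimSuffix(s, "IB") // "KIB" becomes "K"
	s = strings.TrimSuffix(s, "B")  // "KB" becomes "K", "12B" becomes "12"

	const units = "KMGTP"
	mult := 1.0
	if n := len(s); n > 0 {
		if i := strings.IndexByte(units, s[n-1]); i >= 0 {
			// K is 1024^1, M is 1024^2, and so on.
			for j := 0; j <= i; j++ {
				mult *= 1024
			}
			s = strings.TrimSpace(s[:n-1])
		}
	}

	v, err := strconv.ParseFloat(s, 64)
	if err != nil {
		return 0, fmt.Errorf("parse size %q: %w", s, err)
	}
	return int64(v * mult), nil
}

func main() {
	for _, in := range []string{"1234", "1.5K", "20 MB", "-"} {
		n, err := parseSize(in)
		fmt.Printf("%q -> %d, err=%v\n", in, n, err)
	}
}
```

Sizes shown with a unit are rounded by the server, so any value recovered this way is approximate.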

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| go.mod | Added dependencies for htmlquery and xpath libraries |
| go.sum | Added checksums for new dependencies and their transitive dependency golang/groupcache |
| drivers/autoindex/util.go | Implements size parsing from human-readable strings with unit conversion |
| drivers/autoindex/meta.go | Defines driver configuration and registration with required XPath fields |
| drivers/autoindex/driver.go | Core driver implementation with List and Link methods for fetching and parsing autoindex pages |
| drivers/all.go | Registers the new autoindex driver with the driver registry |

xrgzs added 4 commits January 15, 2026 16:43
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
Signed-off-by: MadDogOwner <[email protected]>
xrgzs previously approved these changes Jan 15, 2026
@xrgzs xrgzs left a comment

LGTM, please add the documentation.

KirCute commented Jan 15, 2026

> LGTM, please add the documentation.

I'll write it when I have time.

jyxjjj previously approved these changes Jan 16, 2026
@xrgzs xrgzs merged commit f057846 into OpenListTeam:main Jan 17, 2026
12 checks passed
@KirCute KirCute deleted the feat/autoindex-driver branch January 17, 2026 12:13
Labels

enhancement Module: Driver Driver-Related Issue/PR
